A Robust Model for Intelligent Text Classification

نویسندگان

  • Roberto Basili
  • Alessandro Moschitti
چکیده

Methods for taking into account linguistic content into text retrieval are receiving a growing attention [16],[14]. Text categorization is an interesting area for evaluating and quantifying the impact of linguistic information. Works in text retrieval through Internet suggest that embedding linguistic information at a suitable level within traditional quantitative approaches (e.g. sense distinctions for query expansion as in [14]) is the crucial issue able to bring the experimental stage to operational results. This kind of representational problem is also studied in this paper where traditional methods for statistical text categorization are augmented via a systematic use of linguistic information. Again, as in [14], the addition of NLP capabilities also suggested a different application of existing methods in revised forms. This paper presents an extension of the Rocchio formula [11] as a feature weighting and selection model used as a basis for multilingual Information Extraction. It allows an effective exploitation of the available linguistic information that better emphasizes this latter with significant both data compression and accuracy. The results is an original statistical classifier fed with linguistic (i.e. more complex) features and characterized by the novel feature selection and weighting model. It outperforms existing systems by keeping most of their interesting properties (i.e. easy implementation, low complexity and high scalability). Extensive tests of the model suggest its application as a viable and robust tool for large scale text classification and filtering, as well as a basic module for more complex scenarios.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Intelligent and Robust Genetic Algorithm Based Classifier

The concepts of robust classification and intelligently controlling the search process of genetic algorithm (GA) are introduced and integrated with a conventional genetic classifier for development of a new version of it, which is called Intelligent and Robust GA-classifier (IRGA-classifier). It can efficiently approximate the decision hyperplanes in the feature space. It is shown experime...

متن کامل

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...

متن کامل

Palarimetric Synthetic Aperture Radar Image Classification using Bag of Visual Words Algorithm

Land cover is defined as the physical material of the surface of the earth, including different vegetation covers, bare soil, water surface, various urban areas, etc. Land cover and its changes are very important and influential on the Earth and life of living organisms, especially human beings. Land cover change monitoring is important for protecting the ecosystem, forests, farmland, open spac...

متن کامل

A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the fast increase of the documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the big size of features space in TDC. TDC includes different actions such as text processing, feature extraction, form...

متن کامل

An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification

In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...

متن کامل

Intelligent Auto pilot Design for a Nonlinear Model of an Autonomous Helicopter by Adaptive Emotional Approach

There is a growing interest in the modeling and control of model helicopters using nonlinear dynamic models and nonlinear control. Application of a new intelligent control approach called Brain Emotional Learning Based Intelligent Controller (BELBIC) to design autopilot for an autonomous helicopter is addressed in this paper. This controller is applied to a nonlinear model of a helicopter. This...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001